AutoTune: Optimizing Execution Concurrency and Resource Usage in MapReduce Workflows

Authors

  • Zhuoyao Zhang
  • Ludmila Cherkasova
  • Boon Thau Loo

Abstract

An increasing number of MapReduce applications are written using high-level SQL-like abstractions on top of MapReduce engines. Such programs are translated into MapReduce workflows in which the output of one job becomes the input of the next job. A user must specify the number of reduce tasks for each MapReduce job in a workflow, and this setting can significantly affect the execution concurrency, processing efficiency, and completion time of the workflow. In this work, we outline an automated performance evaluation framework, called AutoTune, that guides the user's efforts in tuning the reduce task settings of sequential MapReduce workflows while achieving performance objectives. We evaluate the performance benefits of the proposed framework using a set of realistic MapReduce applications: TPC-H queries and custom programs mining a collection of enterprise web proxy logs.
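The abstract's central point is that the reduce-task count chosen for one job affects more than that job, because in Hadoop each reduce task writes its own output file and those files become the next job's input. The Python sketch below illustrates that coupling under a deliberately simple, hypothetical cost model: every stage runs in "waves" of equal-length tasks, each input file yields ceil(size / block) map splits, and the total number of reduce tasks stands in for resource usage. The `Job` profile fields, `stage_time`, `workflow_time`, `tune`, `block_mb`, and `tolerance` are illustrative names, not part of AutoTune or any Hadoop API, and the exhaustive search is a stand-in rather than the framework's actual method, which the abstract does not describe.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical per-job profile; all values are illustrative, not measured."""
    input_mb: float       # input size (used only for the first job in the workflow)
    shuffle_mb: float     # intermediate data partitioned across the reduce tasks
    output_mb: float      # data written by the reduce tasks (next job's input)
    map_rate: float       # MB/s processed by one map task
    reduce_rate: float    # MB/s processed by one reduce task
    task_overhead: float  # fixed start-up cost per task, in seconds


def stage_time(n_tasks, slots, per_task_s):
    """Wave model: n_tasks equal-length tasks run in ceil(n_tasks / slots) waves."""
    return math.ceil(n_tasks / slots) * per_task_s


def workflow_time(jobs, reducers, map_slots, reduce_slots, block_mb=128):
    """Estimated completion time of a sequential workflow for one reduce setting."""
    total = 0.0
    prev_files = None  # (number, size in MB) of files written by the previous job
    for job, r in zip(jobs, reducers):
        if prev_files is None:
            n_maps = math.ceil(job.input_mb / block_mb)
            map_in_mb = job.input_mb / n_maps
        else:
            # Each of the previous job's output files yields at least one split.
            count, size_mb = prev_files
            splits_per_file = max(1, math.ceil(size_mb / block_mb))
            n_maps = count * splits_per_file
            map_in_mb = size_mb / splits_per_file
        total += stage_time(n_maps, map_slots,
                            job.task_overhead + map_in_mb / job.map_rate)
        total += stage_time(r, reduce_slots,
                            job.task_overhead + job.shuffle_mb / r / job.reduce_rate)
        prev_files = (r, job.output_mb / r)
    return total


def tune(jobs, candidates, map_slots, reduce_slots, tolerance=0.05):
    """Try every combination of reduce-task counts, one count per job."""
    times = {s: workflow_time(jobs, s, map_slots, reduce_slots)
             for s in itertools.product(candidates, repeat=len(jobs))}
    best = min(times.values())
    # Among settings within `tolerance` of the fastest, keep the one with the
    # fewest reduce tasks in total (ties broken by time, then by the setting).
    _, t, setting = min((sum(s), t, s) for s, t in times.items()
                        if t <= best * (1 + tolerance))
    return setting, t
```

For example, `workflow_time([Job(1024, 512, 256, 8, 4, 2)], [4], map_slots=8, reduce_slots=4)` models one job with eight 18-second map tasks and four 34-second reduce tasks, each stage finishing in a single wave, for 52 seconds in total. The `tolerance` parameter trades a small amount of estimated completion time for fewer reduce tasks; setting it to 0 returns a fastest setting with the fewest reducers among ties.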

Similar resources

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and the same workload is assigned to each node. Default Hadoop d...

Cost optimized provisioning of elastic resources for application workflows

Workflow technologies have become a major vehicle for easy and efficient development of scientific applications. In the meantime, state-of-the-art resource provisioning technologies such as cloud computing enable users to acquire computing resources dynamically and elastically. A critical challenge in integrating workflow technologies with resource provisioning technologies is to determine the ...

Adaptive workflow scheduling for dynamic grid and cloud computing environment

Effective scheduling is a key concern for the execution of performance-driven grid applications such as workflows. In this paper, we first define the workflow scheduling problem and describe the existing heuristic-based and metaheuristic-based workflow scheduling strategies in grids. Then, we propose a dynamic critical-path-based adaptive workflow scheduling algorithm for grids, which determines...

FMEM: A Fine-grained Memory Estimator for MapReduce Jobs

MapReduce is designed as a simple and scalable framework for big data processing. Due to the lack of resource usage models, its implementation, Hadoop, hands resource planning and optimization work over to users. However, users also find it difficult to specify the right resource-related, especially memory-related, configurations without good knowledge of a job’s memory usage. Modeling memory usage is chall...

Special Issue for Emerging Computational Methods for the Life Sciences Workshop

This paper surveys the contents of the special issue on Emerging Computational Methods for the Life Sciences Workshop with six contributed papers. They cover a rich variety of topics at the interface of the life sciences and computation, namely: parallelizing two popular microarray data analysis techniques using the Simple Parallel R Interface (SPRINT); parallelization of PEMer structural v...

Publication date: 2013